Abstract

We give necessary and sufficient conditions for both square integrability
and smoothness for densities of a probability measure on a compact
connected Lie group.



1 Introduction 

The study of probability measures on groups provides a mathematical frame- 
work for describing the interaction of chance with symmetry. This subject 
is broad and interacts with many other areas of mathematics and its
applications such as analysis on groups [18], stochastic differential
geometry [B], statistics [5] and engineering.

In this paper we focus on the important question concerning when a 
probability measure on a compact group has a regular density with respect 
to Haar measure. We begin by reviewing work from [1] where Peter-Weyl
theory is used to find a necessary and sufficient condition for such a measure
to have a square-integrable density. This condition requires the convergence
of an infinite series of terms that are formed from the (non-commutative)
Fourier transform of the measure in question. We also describe a related
result from [2] where it is shown that the existence of a square-integrable
density for the measure is a necessary and sufficient condition for the
associated convolution operator to be Hilbert-Schmidt (and hence compact)
on the $L^2$-space of Haar measure.

In the second part of our paper we turn our attention to measures with 
smooth densities. A key element of our approach is the important insight of 
Hermann Weyl that the unitary dual $\widehat{G}$ of the group $G$ can be parameterised
by the space of highest weights. This effectively opens up $\widehat{G}$ to investigation
by standard analytical methods. We introduce Sugiura's space of rapidly
decreasing functions of weights which was shown in [18] to be topologically
isomorphic to $C^\infty(G)$. We are then able to prove that a probability measure
has a smooth density if and only if its Fourier transform lies in Sugiura's
space. This improves on results of [3] where the Sobolev embedding theorem
was used to find sufficient conditions for such a density to exist. 

In the last part of the paper we give a brief application to statistical 
inference. In [13], Kim and Richards have introduced an estimator for the
density of a signal on the group based on i.i.d. (i.e. independent and iden- 
tically distributed) observations of the signal after it has interacted with an 
independent noise. To obtain fast rates of convergence to the true density, 
the noise should be in a suitable "smoothness class" where smoothness is 
here measured in terms of the decay of the Fourier transform of the measure. 
We show that the "super-smooth" class is smooth in the usual mathematical 
sense. 

2 Fourier Transforms of Measures on Groups 

Throughout this paper $G$ is a compact connected Lie group with neutral
element $e$ and dimension $d$, $\mathcal{B}(G)$ is the Borel $\sigma$-algebra of $G$ and $\mathcal{P}(G)$ is
the space of probability measures on $(G, \mathcal{B}(G))$, equipped with the topology
of weak convergence. The role of the uniform distribution on $G$ is played by
normalised Haar measure $m \in \mathcal{P}(G)$ and we recall that this is a bi-invariant
measure in that

    $$m(A\sigma) = m(\sigma A) = m(A),$$

for all $A \in \mathcal{B}(G)$, $\sigma \in G$. We will generally write $m(d\sigma) = d\sigma$ within integrals.

Our main focus in this paper is those $\rho \in \mathcal{P}(G)$ that are absolutely
continuous with respect to $m$ and so they have densities $f \in L^1(G, m)$ satisfying

    $$\rho(A) = \int_A f(\sigma)\,d\sigma,$$

for all $A \in \mathcal{B}(G)$.

A key tool which we will use to study these measures is the non-commutative 
Fourier transform which is defined using representation theory. We recall
some key facts that we need. A good reference for the material below
about group representations, the Peter-Weyl theorem and Fourier analysis of
square-integrable functions is Faraut [7].

If $H$ is a complex separable Hilbert space then $\mathcal{U}(H)$ is the group of
all unitary operators on $H$. A unitary representation of $G$ is a strongly
continuous homomorphism $\pi$ from $G$ to $\mathcal{U}(V_\pi)$ for some such Hilbert space
$V_\pi$. So we have for all $g, h \in G$:

• $\pi(gh) = \pi(g)\pi(h)$,

• $\pi(e) = I_\pi$ (where $I_\pi$ is the identity operator on $V_\pi$),

• $\pi(g^{-1}) = \pi(g)^{-1} = \pi(g)^*$.

$\pi$ is irreducible if it has no non-trivial invariant closed subspace. Every
group has a trivial representation $\delta$ acting on $V_\delta = \mathbb{C}$ by $\delta(g) = 1$ for all
$g \in G$ and it is clearly irreducible. The unitary dual of $G$, $\widehat{G}$, is defined to
be the set of equivalence classes of all irreducible representations of $G$ with
respect to unitary conjugation. We will as usual identify each equivalence
class with a typical representative element. As $G$ is compact, for all
$\pi \in \widehat{G}$, $d_\pi := \dim(V_\pi) < \infty$ so that each $\pi(g)$ is a unitary matrix. Furthermore
in this case $\widehat{G}$ is countable.

For each $\pi \in \widehat{G}$, we define co-ordinate functions $\pi_{ij}(\sigma) = \pi(\sigma)_{ij}$ with
respect to some orthonormal basis in $V_\pi$.

Theorem 2.1 (Peter-Weyl) The set $\{\sqrt{d_\pi}\,\pi_{ij}, 1 \leq i, j \leq d_\pi, \pi \in \widehat{G}\}$ is a
complete orthonormal basis for $L^2(G, \mathbb{C})$.

The following consequences of Theorem 2.1 are straightforward to derive
using Hilbert space arguments.

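Corollary 2.1 For $f, g \in L^2(G, \mathbb{C})$

• Fourier expansion.

    $$f = \sum_{\pi \in \widehat{G}} d_\pi \mathrm{tr}(\widehat{f}(\pi)\pi),$$

where $\widehat{f}(\pi) := \int_G f(\sigma^{-1})\pi(\sigma)\,d\sigma$ is the Fourier transform of $f$.

• The Plancherel theorem.

    $$\|f\|_2^2 = \sum_{\pi \in \widehat{G}} d_\pi |||\widehat{f}(\pi)|||^2,$$

where $|||\cdot|||$ is the Hilbert-Schmidt norm $|||T||| := \mathrm{tr}(TT^*)^{1/2}$.

• The Parseval identity.

    $$\langle f, g \rangle = \sum_{\pi \in \widehat{G}} d_\pi \mathrm{tr}(\widehat{f}(\pi)\widehat{g}(\pi)^*).$$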

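If $\mu \in \mathcal{P}(G)$ we define its Fourier transform $\widehat{\mu}$ to be

    $$\widehat{\mu}(\pi) = \int_G \pi(\sigma^{-1})\,\mu(d\sigma),$$

for each $\pi \in \widehat{G}$. For example if $\epsilon_e$ is a Dirac mass at $e$ then $\widehat{\epsilon_e}(\pi) = I_\pi$
and $\widehat{m}(\pi) = 0$ for $\pi \neq \delta$, while $\widehat{m}(\delta) = 1$. If $\mu$ has a density $f$ then
$\widehat{\mu} = \widehat{f}$ as defined in Corollary 2.1. If we take $G$ to be the $d$-torus
$\mathbb{T}^d$ then $\widehat{G}$ is the dual group $\mathbb{Z}^d$
and the Fourier transform is precisely the usual characteristic function of the
measure $\mu$ defined by $\widehat{\mu}(n) = \int_{\mathbb{T}^d} e^{-i n \cdot x}\mu(dx)$ for $n \in \mathbb{Z}^d$, where $\cdot$ is the scalar
product. Note that any compact connected abelian Lie group is isomorphic
to $\mathbb{T}^d$.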
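As a small numerical illustration of the torus case, the following sketch evaluates the coefficients $\int e^{-inx}\mu(dx)$ on the circle ($d = 1$) for discrete measures. The function names and the use of an equally spaced grid as a stand-in for Haar measure are assumptions made purely for this example; they do not come from the references.

```python
import cmath
import math


def fourier_coefficient(points, weights, n):
    """Return sum_j w_j * exp(-i*n*x_j), the n-th coefficient of sum_j w_j * delta_{x_j}."""
    return sum(w * cmath.exp(-1j * n * x) for x, w in zip(points, weights))


def dirac_at_zero():
    """Dirac mass at the neutral element 0 of the circle."""
    return [0.0], [1.0]


def uniform_grid(size):
    """Equal masses on `size` equally spaced angles (a discrete stand-in for Haar measure)."""
    points = [2.0 * math.pi * j / size for j in range(size)]
    return points, [1.0 / size] * size
```

For the Dirac mass every coefficient equals 1, while for the grid measure the coefficient at $n = 0$ is 1 and the coefficients at nonzero $n$ that are not multiples of the grid size vanish, in line with the values of the transforms of the Dirac mass and of Haar measure.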

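Fourier transforms of measures on groups have been studied by many
authors, see e.g. [12], [10], [9], [16] where proofs of the following basic properties
can be found.

For all $\mu, \mu_1, \mu_2 \in \mathcal{P}(G)$, $\pi \in \widehat{G}$,

1. $\widehat{\mu_1 * \mu_2}(\pi) = \widehat{\mu_2}(\pi)\widehat{\mu_1}(\pi)$,

2. $\widehat{\mu}$ determines $\mu$ uniquely,

3. $\|\widehat{\mu}(\pi)\|_\infty \leq 1$, where $\|\cdot\|_\infty$ denotes the operator norm in $V_\pi$.

4. Let $(\mu_n, n \in \mathbb{N})$ be a sequence in $\mathcal{P}(G)$. Then $\mu_n \to \mu$ (weakly) if and only if
$\widehat{\mu_n}(\pi) \to \widehat{\mu}(\pi)$.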
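Properties 1 and 3 can be checked numerically on a non-abelian example. The sketch below works with discrete measures on $SU(2)$ and takes $\pi$ to be the two-dimensional defining representation; the particular group elements, weights and helper names are arbitrary choices made for illustration only.

```python
import cmath
import math


def su2(theta, phi, psi):
    """Element [[a, -conj(b)], [b, conj(a)]] of SU(2) with |a|^2 + |b|^2 = 1."""
    a = cmath.exp(1j * phi) * math.cos(theta)
    b = cmath.exp(1j * psi) * math.sin(theta)
    return ((a, -b.conjugate()), (b, a.conjugate()))


def matmul(x, y):
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def adjoint(x):
    return tuple(tuple(x[j][i].conjugate() for j in range(2)) for i in range(2))


def fourier_transform(measure):
    """Sum of w * pi(g^{-1}) over the atoms (w, g); g^{-1} is the adjoint of g."""
    total = [[0j, 0j], [0j, 0j]]
    for w, g in measure:
        inv = adjoint(g)
        for i in range(2):
            for j in range(2):
                total[i][j] += w * inv[i][j]
    return tuple(tuple(row) for row in total)


def convolve(mu1, mu2):
    """Convolution of discrete measures: mass w*v at the product g*h."""
    return [(w * v, matmul(g, h)) for w, g in mu1 for v, h in mu2]


def max_abs_diff(x, y):
    return max(abs(x[i][j] - y[i][j]) for i in range(2) for j in range(2))


def operator_norm(x):
    """Largest singular value of a 2x2 matrix, from the eigenvalues of x* x."""
    h = matmul(adjoint(x), x)
    tr = (h[0][0] + h[1][1]).real
    det = (h[0][0] * h[1][1] - h[0][1] * h[1][0]).real
    disc = max(tr * tr / 4.0 - det, 0.0)
    return math.sqrt(tr / 2.0 + math.sqrt(disc))


mu1 = [(0.3, su2(0.4, 0.1, 0.7)), (0.7, su2(1.1, 2.0, 0.3))]
mu2 = [(0.5, su2(0.9, 1.3, 0.2)), (0.5, su2(0.2, 0.5, 1.9))]

lhs = fourier_transform(convolve(mu1, mu2))
rhs = matmul(fourier_transform(mu2), fourier_transform(mu1))
```

Here `lhs` and `rhs` agree up to rounding, which reflects the reversed order of the factors in Property 1, and the operator norms of the transforms do not exceed 1.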

Remark. Most authors define $\widehat{\mu}(\pi) = \int_G \pi(\sigma)\,\mu(d\sigma)$. This has the
advantage that Property 1 above will then read
$\widehat{\mu_1 * \mu_2}(\pi) = \widehat{\mu_1}(\pi)\widehat{\mu_2}(\pi)$ but
the disadvantage that if $\mu$ has density $f$ then $\widehat{\mu}(\pi) = \widehat{f}(\pi)^*$. It is also worth
pointing out that the Fourier transform continues to make sense and is a
valuable probabilistic tool in the case where $G$ is a general locally compact
group (see e.g. [10], [9], [16]).

3 Measures With Square-Integrable Densities



In this section we examine the case where $\mu$ has a square-integrable density.
The following result can be found in [1] and so we only sketch the proof here.

Theorem 3.1 $\mu$ has an $L^2$-density $f$ if and only if

    $$\sum_{\pi \in \widehat{G}} d_\pi |||\widehat{\mu}(\pi)|||^2 < \infty.$$

In this case

    $$f(\sigma) = \sum_{\pi \in \widehat{G}} d_\pi \mathrm{tr}(\widehat{\mu}(\pi)\pi(\sigma)) \quad \text{a.e.}$$

Proof. Necessity is straightforward. For sufficiency define $g := \sum_{\pi \in \widehat{G}} d_\pi \mathrm{tr}(\widehat{\mu}(\pi)\pi)$.
Then $g \in L^2(G, \mathbb{C})$ and by uniqueness of Fourier coefficients $\widehat{g}(\pi) = \widehat{\mu}(\pi)$.
Using Parseval's identity, Fubini's theorem and Fourier expansion, we find
that for each $h \in C(G, \mathbb{C})$:

    $$\int_G h(\sigma)\overline{g(\sigma)}\,d\sigma = \sum_{\pi \in \widehat{G}} d_\pi \mathrm{tr}(\widehat{h}(\pi)\widehat{\mu}(\pi)^*) = \int_G h(\sigma)\,\mu(d\sigma).$$

This together with the Riesz representation theorem implies that $g$ is real
valued and $g(\sigma)\,d\sigma = \mu(d\sigma)$. The fact that $g$ is non-negative then follows
from the Jordan decomposition for signed measures. □
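The criterion of Theorem 3.1 is easy to explore numerically when the Fourier transform is a scalar multiple $c_\pi I_\pi$ of the identity, since then $d_\pi |||\widehat{\mu}(\pi)|||^2 = d_\pi^2 |c_\pi|^2$. The sketch below does this for $G = SU(2)$, assuming the standard labelling of $\widehat{G}$ by $n = 0, 1, 2, \ldots$ with $d_n = n + 1$. The heat-type coefficient $e^{-t n(n+2)}$ uses an assumed normalisation of the Laplacian, and all names are illustrative.

```python
import math


def plancherel_partial_sum(coefficient, n_max):
    """Partial sum over n = 0..n_max of d_n * |||c_n I_n|||^2 = (n + 1)^2 * |c_n|^2."""
    return sum((n + 1) ** 2 * abs(coefficient(n)) ** 2 for n in range(n_max + 1))


def dirac_coefficient(n):
    """The Dirac mass at e has Fourier transform I for every n, so c_n = 1."""
    return 1.0


def heat_coefficient(n, t=0.1):
    """Heat-type coefficient exp(-t * n * (n + 2)); the eigenvalue n(n+2) is an assumed normalisation."""
    return math.exp(-t * n * (n + 2))
```

The partial sums for the Dirac mass grow like $n^3$, so it has no $L^2$-density, whereas the partial sums for the heat-type coefficients stabilise almost immediately.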

See [1] for specific examples. We will examine some of these in the next
section from the finer point of view of smoothness. 

To study random walks and Lévy processes in $G$ we need the convolution
operator $T_\mu$ in $L^2(G, \mathbb{C})$ associated to $\mu \in \mathcal{P}(G)$ by

    $$(T_\mu f)(\sigma) := \int_G f(\sigma\tau)\,\mu(d\tau),$$

for $f \in L^2(G, \mathbb{C})$, $\sigma \in G$. For example $T_\mu$ is the transition operator
corresponding to the random walk $(\mu^{*n}, n \in \mathbb{N})$. The following properties are
fairly easy to establish.

• $T_\mu$ is a contraction.

• $T_\mu$ is self-adjoint if and only if $\mu$ is symmetric, i.e. $\mu(A) = \mu(A^{-1})$ for
all $A \in \mathcal{B}(G)$.

The next result is established in [2].

Theorem 3.2 The operator $T_\mu$ is Hilbert-Schmidt if and only if $\mu$ has a
square-integrable density.

Proof. Sufficiency is obvious by the Hilbert-Schmidt theorem. For necessity,
suppose that $T_\mu$ is Hilbert-Schmidt. Then it has a kernel $k_\mu \in L^2(G \times G)$
and

    $$(T_\mu f)(\sigma) = \int_G f(\tau) k_\mu(\sigma, \tau)\,d\tau.$$

In particular for each $A \in \mathcal{B}(G)$,

    $$\mu(A) = T_\mu 1_A(e) = \int_A k_\mu(e, \tau)\,d\tau.$$

It follows that $\mu$ is absolutely continuous with respect to $m$ with density
$f = k_\mu(e, \cdot)$. □

Let $(\mu_t, t \geq 0)$ be a weakly continuous convolution semigroup in $\mathcal{P}(G)$
and write $T_t := T_{\mu_t}$. Then $(T_t, t \geq 0)$ is a strongly continuous contraction
semigroup on $L^2(G, \mathbb{C})$ (see e.g. [11], [10], [H E]).

Corollary 3.1 The linear operator $T_t$ is trace-class for all $t > 0$ if and only
if $\mu_t$ has a square-integrable density for all $t > 0$.

Proof. For each $t > 0$, if $\mu_{t/2}$ has a square-integrable density then by
Theorem 3.2, $T_t = T_{t/2}T_{t/2}$
is the product of two Hilbert-Schmidt operators and hence is trace class.
The converse follows from the fact that every trace-class operator is
Hilbert-Schmidt. □

If for $t > 0$, $\mu_t$ has a square-integrable density and is symmetric, then
by Theorem 3.2 $T_t$ is a compact self-adjoint operator and so has a discrete
spectrum of positive eigenvalues $1 = e^{-t\beta_1} > e^{-t\beta_2} > \cdots > e^{-t\beta_n} \to 0$ as
$n \to \infty$. Furthermore by Corollary 3.1 $T_t$ is trace class and

    $$\mathrm{Tr}(T_t) = \sum_{n=1}^{\infty} e^{-t\beta_n} < \infty.$$

Further consequences of these facts including the application to small time
asymptotics of densities can be found in [2], [3].

4 Sugiura Space and Smoothness 

In this section we will review key results due to Sugiura [18] which we will
apply to densities in the next section. In order to do this we need to know 
about weights on Lie algebras and we will briefly review the necessary theory. 






4.1 Weights 



Let $\mathfrak{g}$ be the Lie algebra of $G$ and $\exp : \mathfrak{g} \to G$ be the exponential map. For
each unitary representation $\pi$ of $G$ we obtain a Lie algebra representation $d\pi$
by

    $$\pi(\exp(tX)) = e^{t\,d\pi(X)} \quad \text{for all } t \in \mathbb{R}.$$

Each $d\pi(X)$ is a skew-adjoint matrix on $V_\pi$ and

    $$d\pi([X, Y]) = [d\pi(X), d\pi(Y)],$$

for all $X, Y \in \mathfrak{g}$. A maximal torus $T$ in $G$ is a maximal connected commutative
subgroup of $G$. Its dimension $r$ is called the rank of $G$. Here are some key facts about
maximal tori.

• Any $\sigma \in G$ lies on some maximal torus.

• Any two maximal tori are conjugate.

Let $\mathfrak{t}$ be the Lie algebra of $T$. Then it is a maximal abelian subalgebra of $\mathfrak{g}$.
The matrices $\{d\pi(X), X \in \mathfrak{t}\}$ are mutually commuting and so simultaneously
diagonalisable, i.e. there exists a non-singular matrix $Q$ such that

    $$Q\,d\pi(X)\,Q^{-1} = \mathrm{diag}(i\lambda_1(X), \ldots, i\lambda_{d_\pi}(X)).$$

The distinct linear functionals $\lambda_j$ are called the weights of $\pi$.

Let Ad be the adjoint representation of $G$ on $\mathfrak{g}$. We can and will choose
an Ad-invariant inner product $(\cdot, \cdot)$ on $\mathfrak{g}$. This induces an inner product
on $\mathfrak{t}^*$, the algebraic dual of $\mathfrak{t}$, which we also write as $(\cdot, \cdot)$. We denote the
corresponding norm by $|\cdot|$. The weights of the adjoint representation acting
on $\mathfrak{g}$ equipped with $(\cdot, \cdot)$ are called the roots of $G$. Let $\mathcal{P}$ be the set of all
roots of $G$. We choose a convention for positivity of roots as follows. Pick
$v \in \mathfrak{t}$ such that $\mathcal{P} \cap \{\eta \in \mathfrak{t}^*; \eta(v) = 0\} = \emptyset$. Now define $\mathcal{P}_+ = \{\alpha \in \mathcal{P}; \alpha(v) > 0\}$.
We can always find a subset $\mathcal{Q} \subset \mathcal{P}_+$ so that $\mathcal{Q}$ forms a basis for $\mathfrak{t}^*$ and every
$\alpha \in \mathcal{P}$ is a linear combination of elements of $\mathcal{Q}$ with integer coefficients, all
of which are either nonnegative or nonpositive. The elements of $\mathcal{Q}$ are called
fundamental roots.

It can be shown that every weight of $\pi$ is of the form

    $$\mu_\pi = \lambda_\pi - \sum_{\alpha \in \mathcal{Q}} n_\alpha \alpha$$

where each $n_\alpha$ is a non-negative integer and $\lambda_\pi$ is a weight of $\pi$ called the
highest weight. Indeed if $\mu_\pi$ is any other weight of $\pi$ then $|\mu_\pi| \leq |\lambda_\pi|$. The
highest weight of a representation is invariant under unitary conjugation of
the latter and so there is a one-to-one correspondence between $\widehat{G}$ and the
space of highest weights $D$ of all irreducible representations of $G$. We can thus
parameterise $\widehat{G}$ by $D$ and this is a key step for Fourier analysis on nonabelian
compact Lie groups. In fact $D$ can be given a nice geometrical description
as the intersection of the weight lattice with the dominant Weyl chamber,
but in order to save space we won't pursue that line of reasoning here. From
now on we will use the notation $d_\lambda$ interchangeably with $d_\pi$ to denote the
dimension of the space $V_\pi$ where $\pi \in \widehat{G}$ has highest weight $\lambda$. For a more
comprehensive discussion of roots and weights, see e.g. [8] and [17].

4.2 Sugiura Theory 

The main result of this subsection is Theorem 4.1 which is proved in [18].

Let $M_n(\mathbb{C})$ denote the space of all $n \times n$ matrices with complex entries
and $\mathcal{M}(\widehat{G}) := \bigcup_{\lambda \in D} M_{d(\lambda)}(\mathbb{C})$. We define the Sugiura space of rapid decrease
to be $\mathcal{S}(D) := \{F : D \to \mathcal{M}(\widehat{G})\}$ such that

(i) $F(\lambda) \in M_{d(\lambda)}(\mathbb{C})$ for all $\lambda \in D$,

(ii) $\lim_{|\lambda| \to \infty} |\lambda|^k |||F(\lambda)||| = 0$ for all $k \in \mathbb{N}$.

$\mathcal{S}(D)$ is a locally convex topological vector space with respect to the
seminorms $\|F\|_s = \sup_{\lambda \in D} |\lambda|^s |||F(\lambda)|||$, where $s \geq 0$. We also note that $C^\infty(G)$
is a locally convex topological vector space with respect to the seminorms
$\|f\|_U = \sup_{\sigma \in G} |Uf(\sigma)|$ where $U \in \mathcal{U}(\mathfrak{g})$, which is the universal enveloping
algebra of $\mathfrak{g}$ acting on $C^\infty(G)$ as polynomials in left-invariant vector fields
on $G$, as described by the celebrated Poincaré-Birkhoff-Witt theorem.

Theorem 4.1 [Sugiura] There is a topological isomorphism between $C^\infty(G)$
and $\mathcal{S}(D)$.

We list three useful facts that we will need in the next section. All can
be found in [18].

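• Weyl's dimension formula states that

    $$d_\lambda = \frac{\prod_{\alpha \in \mathcal{P}_+} (\lambda + \rho, \alpha)}{\prod_{\alpha \in \mathcal{P}_+} (\rho, \alpha)},$$

where $\rho := \frac{1}{2}\sum_{\alpha \in \mathcal{P}_+} \alpha$ is the celebrated "half-sum of positive roots".
From here we can deduce a highly useful inequality. Namely there
exists $N > 0$ such that for all $\lambda \in D - \{0\}$

    $$d_\lambda \leq N|\lambda|^m \qquad (4.1)$$

where $m := \#\mathcal{P}_+ = \frac{1}{2}(d - r)$.

• Sugiura's zeta function is defined by

    $$\zeta(s) = \sum_{\lambda \in D - \{0\}} \frac{1}{|\lambda|^{2s}}$$

and it converges if $s > r$.

• Let $(X_1, \ldots, X_d)$ be a basis for $\mathfrak{g}$ and let $\Delta \in \mathcal{U}(\mathfrak{g})$ be the usual
Laplacian on $G$ so that

    $$\Delta = \sum_{i,j=1}^{d} g^{ij} X_i X_j$$

where $(g^{ij})$ is the inverse of the matrix whose $(i,j)$th component is
$(X_i, X_j)$. We may consider $\Delta$ as a linear operator on $L^2(G)$ with domain
$C^\infty(G)$. It is essentially self-adjoint and

    $$\Delta \pi_{ij} = -\kappa_\pi \pi_{ij}$$

for all $1 \leq i, j \leq d_\pi$, $\pi \in \widehat{G}$, where $\pi \neq \delta \Rightarrow \kappa_\pi > 0$. The numbers
$(\kappa_\pi, \pi \in \widehat{G})$ are called the Casimir spectrum and if $\lambda_\pi$ is the highest
weight corresponding to $\pi \in \widehat{G}$ then

    $$\kappa_\pi = (\lambda_\pi, \lambda_\pi + 2\rho).$$

From here we deduce that there exists $C > 0$ such that

    $$|\lambda_\pi|^2 \leq \kappa_\pi \leq C(1 + |\lambda_\pi|^2). \qquad (4.2)$$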
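To make Weyl's dimension formula and the Casimir spectrum concrete, the following sketch evaluates both for $G = SU(3)$ (so $r = 2$ and $m = 3$). It assumes the standard realisation of the root system $A_2$ inside $\mathbb{R}^3$ with the Euclidean inner product, and it labels highest weights as $p\omega_1 + q\omega_2$ with $p, q$ non-negative integers. These coordinates and all function names are choices made for this illustration.

```python
from fractions import Fraction

# Assumed coordinates for the A_2 root system (trace-zero vectors in R^3).
ALPHA1 = (Fraction(1), Fraction(-1), Fraction(0))
ALPHA2 = (Fraction(0), Fraction(1), Fraction(-1))
POSITIVE_ROOTS = [ALPHA1, ALPHA2, tuple(a + b for a, b in zip(ALPHA1, ALPHA2))]
OMEGA1 = (Fraction(2, 3), Fraction(-1, 3), Fraction(-1, 3))
OMEGA2 = (Fraction(1, 3), Fraction(1, 3), Fraction(-2, 3))
# Half-sum of the positive roots.
RHO = tuple(sum(root[i] for root in POSITIVE_ROOTS) / 2 for i in range(3))


def inner(x, y):
    return sum(a * b for a, b in zip(x, y))


def highest_weight(p, q):
    return tuple(p * a + q * b for a, b in zip(OMEGA1, OMEGA2))


def weyl_dimension(p, q):
    """Ratio of products over positive roots of (lambda + rho, alpha) and (rho, alpha)."""
    shifted = tuple(a + b for a, b in zip(highest_weight(p, q), RHO))
    numerator = Fraction(1)
    denominator = Fraction(1)
    for root in POSITIVE_ROOTS:
        numerator *= inner(shifted, root)
        denominator *= inner(RHO, root)
    return numerator / denominator


def norm_squared(p, q):
    lam = highest_weight(p, q)
    return inner(lam, lam)


def casimir(p, q):
    """Casimir eigenvalue (lambda, lambda + 2 rho)."""
    lam = highest_weight(p, q)
    return inner(lam, tuple(a + 2 * b for a, b in zip(lam, RHO)))
```

For instance `weyl_dimension(1, 0)` returns 3 and `weyl_dimension(1, 1)` returns 8, the dimensions of the defining and adjoint representations, and `casimir(p, q)` always dominates `norm_squared(p, q)`, which is the lower bound in (4.2).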

4.3 Smoothness of Densities 

We can now establish our main theorem. 

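Theorem 4.2 $\mu \in \mathcal{P}(G)$ has a $C^\infty$ density if and only if $\widehat{\mu} \in \mathcal{S}(D)$.

Proof. Necessity is obvious. For sufficiency it is enough to show that $\mu$ has an
$L^2$-density, since smoothness of that density then follows from Theorem 4.1.
Choose $s > r$ so that Sugiura's zeta function converges. Writing
$\widehat{\mu}(\lambda)$ for $\widehat{\mu}(\pi)$ when $\pi$ has highest weight $\lambda$, and
using Theorem 3.1 and (4.1) we have

    $$\sum_{\lambda \in D - \{0\}} d_\lambda |||\widehat{\mu}(\lambda)|||^2 \leq N \sum_{\lambda \in D - \{0\}} |\lambda|^m |||\widehat{\mu}(\lambda)|||^2$$

    $$= N \sum_{\lambda \in D - \{0\}} \frac{|\lambda|^{m+2s} |||\widehat{\mu}(\lambda)|||^2}{|\lambda|^{2s}}$$

    $$\leq N \sup_{\lambda \in D - \{0\}} |\lambda|^{m+2s} |||\widehat{\mu}(\lambda)|||^2 \sum_{\lambda \in D - \{0\}} \frac{1}{|\lambda|^{2s}}$$

    $$< \infty. \quad □$$

We now investigate some classes of examples. We say that $\mu \in \mathcal{P}(G)$ is
central if for all $\sigma \in G$,

    $$\mu(\sigma A \sigma^{-1}) = \mu(A) \quad \text{for all } A \in \mathcal{B}(G).$$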
By Schur's lemma $\mu$ is central if and only if for each $\pi \in \widehat{G}$ there exists
$c_\pi \in \mathbb{C}$ such that

    $$\widehat{\mu}(\pi) = c_\pi I_\pi.$$

Clearly $m$ is a central measure. A standard Gaussian measure on $G$ is
central where we say that a measure $\mu$ on $G$ is a standard Gaussian if it can
be realised as $\mu_1^{(B)}$ in the convolution semigroup $(\mu_t^{(B)}, t \geq 0)$ corresponding
to Brownian motion on $G$ (i.e. the associated Markov semigroup of operators
is generated by $\frac{1}{2}\sigma^2\Delta$ where $\sigma > 0$). For a more general notion of Gaussianity
see e.g. [10], section 6.2. To verify centrality, take Fourier transforms of the
heat equation to obtain $\widehat{\mu}(\pi) = e^{-\frac{1}{2}\sigma^2\kappa_\pi} I_\pi$ for each $\pi \in \widehat{G}$.

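Following [3] we introduce a class of central probability measures on $G$
which we call the $\mathrm{CID}_{\mathbb{R}}(G)$ class as they are central and are induced by
infinitely divisible measures on $\mathbb{R}$. Let $\rho$ be a symmetric infinitely divisible
probability measure on $\mathbb{R}$ so we have the Lévy-Khintchine formula

    $$\int_{\mathbb{R}} e^{iux}\rho(dx) = e^{-\eta(u)} \quad \text{for all } u \in \mathbb{R}$$

    $$\text{where } \eta(u) = \frac{1}{2}\sigma^2 u^2 + \int_{\mathbb{R} - \{0\}} (1 - \cos(uy))\,\nu(dy),$$

with $\sigma \geq 0$ and $\nu$ a Lévy measure, i.e. $\int_{\mathbb{R} - \{0\}} (1 \wedge y^2)\,\nu(dy) < \infty$ (see e.g.
[15]). We say $\mu \in \mathrm{CID}_{\mathbb{R}}(G)$ if there exists $\eta$ as above such that

    $$\widehat{\mu}(\pi) = e^{-\eta(\kappa_\pi^{1/2})} I_\pi \quad \text{for each } \pi \in \widehat{G}.$$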

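Examples of such measures are obtained by subordination [15]. So let $(\gamma_t^f, t \geq 0)$
be a subordinator with Bernstein function $f$ so that for all $u > 0$

    $$\int_0^\infty e^{-us}\,\gamma_t^f(ds) = e^{-tf(u)}.$$

Let $(\mu_t^{(B)}, t \geq 0)$ be a Brownian convolution semigroup on $G$ (with $\sigma = \sqrt{2}$) so
that for each $\pi \in \widehat{G}$, $\widehat{\mu_t^{(B)}}(\pi) = e^{-t\kappa_\pi} I_\pi$. Then we obtain a convolution semigroup
of measures $(\mu_t^f, t \geq 0)$ in $\mathrm{CID}_{\mathbb{R}}(G)$ by

    $$\mu_t^f(A) = \int_0^\infty \mu_s^{(B)}(A)\,\gamma_t^f(ds)$$

for each $A \in \mathcal{B}(G)$ and we have

    $$\widehat{\mu_t^f}(\pi) = e^{-tf(\kappa_\pi)} I_\pi.$$

Examples (where we have taken $t = 1$):

• Laplace Distribution $f(u) = \log(1 + \beta^2 u)$,

    $$\widehat{\mu}(\pi) = (1 + \beta^2\kappa_\pi)^{-1} I_\pi.$$

• Stable-like distribution $f(u) = b^\alpha u^{\alpha/2}$ $(0 < \alpha < 2)$,

    $$\widehat{\mu}(\pi) = e^{-b^\alpha \kappa_\pi^{\alpha/2}} I_\pi.$$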
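The subordination mechanism can be checked numerically in the Laplace case. The sketch below assumes that for $f(u) = \log(1 + \beta^2 u)$ the subordinator at time $t = 1$ has the exponential law with density $s \mapsto \beta^{-2} e^{-s/\beta^2}$ (the gamma subordinator), and it approximates $\int_0^\infty e^{-\kappa s}\gamma_1^f(ds)$ by a midpoint rule. The quadrature parameters and function names are illustrative assumptions.

```python
import math


def laplace_transform_exponential(u, beta, steps=100000, cutoff=50.0):
    """Midpoint-rule value of the integral of exp(-u*s) * rate * exp(-rate*s) over s >= 0,
    with rate = 1 / beta^2 and the integral truncated at s = cutoff / rate."""
    rate = 1.0 / beta ** 2
    upper = cutoff / rate
    h = upper / steps
    total = 0.0
    for j in range(steps):
        s = (j + 0.5) * h
        total += math.exp(-(u + rate) * s)
    return rate * h * total


def laplace_type_coefficient(kappa, beta):
    """Scalar multiplying the identity in the Laplace-type Fourier transform."""
    return 1.0 / (1.0 + beta ** 2 * kappa)
```

Up to quadrature error, `laplace_transform_exponential(kappa, beta)` agrees with `laplace_type_coefficient(kappa, beta)`, that is, averaging the Brownian coefficients $e^{-s\kappa_\pi}$ against the subordinator reproduces $(1 + \beta^2\kappa_\pi)^{-1}$.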

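We now apply Theorem 4.2 to present some examples of measures in the
$\mathrm{CID}_{\mathbb{R}}$ class which have smooth densities (and one that doesn't).

Example 1. $\eta$ general with $\sigma \neq 0$ (i.e. non-vanishing Gaussian part).

Using (4.1) and (4.2) we obtain

    $$\lim_{|\lambda| \to \infty} |\lambda|^k |||\widehat{\mu}(\lambda)||| = \lim_{|\lambda| \to \infty} |\lambda|^k e^{-\eta(\kappa_\lambda^{1/2})} d_\lambda^{1/2}$$

    $$\leq \lim_{|\lambda| \to \infty} |\lambda|^k e^{-\frac{1}{2}\sigma^2\kappa_\lambda} d_\lambda^{1/2}$$

    $$\leq N^{1/2} \lim_{|\lambda| \to \infty} |\lambda|^{k + \frac{m}{2}} e^{-\frac{1}{2}\sigma^2|\lambda|^2} = 0.$$

Example 2. Stable-like laws are all $C^\infty$ by a similar argument.

Example 3. The Laplace distribution is not $C^\infty$. But it has an $L^2$-density if $r = 1$
(e.g. $SO(3)$, $SU(2)$, $Sp(1)$).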
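The contrast between the Gaussian and Laplace cases can be seen numerically on $SU(2)$. The sketch below assumes the labelling $n = 0, 1, 2, \ldots$ of $\widehat{G}$ with $d_n = n + 1$, a normalisation in which $|\lambda_n| = n$ and $\kappa_n = n(n + 2)$, and scalar Fourier transforms $c_n I_n$ whose Hilbert-Schmidt norm is $|c_n|(n+1)^{1/2}$. These normalisations and the function names are assumptions made for the example.

```python
import math


def hs_norm_scalar(c, n):
    """Hilbert-Schmidt norm of c * I on a space of dimension n + 1."""
    return abs(c) * math.sqrt(n + 1)


def gaussian_coefficient(n, sigma=1.0):
    return math.exp(-0.5 * sigma ** 2 * n * (n + 2))


def laplace_coefficient(n, beta=1.0):
    return 1.0 / (1.0 + beta ** 2 * n * (n + 2))


def weighted_norm(coefficient, n, k):
    """The quantity |lambda|^k * |||mu_hat(lambda)||| appearing in the definition of S(D)."""
    return n ** k * hs_norm_scalar(coefficient(n), n)


def l2_partial_sum(coefficient, n_max):
    """Partial sum of d_n * |||c_n I_n|||^2 = (n + 1)^2 * c_n^2."""
    return sum((n + 1) ** 2 * coefficient(n) ** 2 for n in range(n_max + 1))
```

With these conventions the Gaussian weighted norms vanish rapidly for every fixed $k$, while the Laplace weighted norm with $k = 2$ grows like $n^{1/2}$, so the Laplace transform is not in $\mathcal{S}(D)$. For $\beta = 1$ one has $1 + n(n+2) = (n+1)^2$, so the Laplace $L^2$ sum reduces to $\sum_{n \geq 0} (n+1)^{-2} = \pi^2/6$, which is finite.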



5 Deconvolution Density Estimation 

We begin by reviewing the work of Kim and Richards in [13]. Let $X$, $Y$ and
$\epsilon$ be $G$-valued random variables with $Y = X\epsilon$. Here we interpret $X$ as a
signal, $Y$ as the observations and $\epsilon$ as the noise which is independent of $X$.
If all three random variables have densities, then with an obvious notation
we have $f_Y = f_X * f_\epsilon$. The statistical problem of interest is to estimate
$f_X$ based on i.i.d. observations $Y_1, \ldots, Y_n$ of the random variable $Y$. We
assume that the matrix $\widehat{f_\epsilon}(\pi)$ is invertible for all $\pi \in \widehat{G}$. Our key tool is the
empirical characteristic function $\widehat{f_Y^{(n)}}(\pi) := \frac{1}{n}\sum_{i=1}^n \pi(Y_i^{-1})$. We then define
the non-parametric density estimator (with smoothing parameters $T_n \to \infty$
as $n \to \infty$) for $\sigma \in G$, $n \in \mathbb{N}$:

    $$f_X^{(n)}(\sigma) := \sum_{\pi \in \widehat{G}: \kappa_\pi < T_n} d_\pi \mathrm{tr}\left(\widehat{f_\epsilon}(\pi)^{-1}\,\widehat{f_Y^{(n)}}(\pi)\,\pi(\sigma)\right).$$
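A minimal sketch of such a truncated Fourier deconvolution estimator is given below for the circle group, where every $d_\pi = 1$, the representations are $x \mapsto e^{ikx}$ for $k \in \mathbb{Z}$ and we take $\kappa_k = k^2$. The von Mises signal, the wrapped Gaussian noise with coefficients $e^{-\sigma^2 k^2/2}$ and all function names are hypothetical choices made for illustration and are not taken from [13]. The estimate is a density with respect to normalised Haar measure $dx/2\pi$.

```python
import cmath
import math
import random


def simulate_observations(n, sigma, seed=0):
    """Draw Y = X + noise (mod 2*pi) with a hypothetical von Mises signal X."""
    rng = random.Random(seed)
    two_pi = 2.0 * math.pi
    samples = []
    for _ in range(n):
        signal = rng.vonmisesvariate(math.pi, 2.0)
        noise = rng.gauss(0.0, sigma)  # wrapped Gaussian noise (assumed model)
        samples.append((signal + noise) % two_pi)
    return samples


def empirical_cf(samples, k):
    """Empirical characteristic function: average of exp(-i*k*Y_j)."""
    return sum(cmath.exp(-1j * k * y) for y in samples) / len(samples)


def noise_cf(k, sigma):
    """Fourier coefficient of the wrapped Gaussian noise."""
    return math.exp(-0.5 * sigma ** 2 * k ** 2)


def deconvolution_estimate(samples, x, cutoff, sigma):
    """Sum over all k with k^2 < cutoff of exp(i*k*x) * empirical_cf(k) / noise_cf(k)."""
    bound = int(math.sqrt(cutoff)) + 1
    total = 0j
    for k in range(-bound, bound + 1):
        if k * k < cutoff:
            total += cmath.exp(1j * k * x) * empirical_cf(samples, k) / noise_cf(k, sigma)
    # Terms for k and -k are complex conjugates, so the sum is real up to rounding.
    return total.real
```

Dividing by the noise coefficient undoes the convolution frequency by frequency, and the cutoff plays the role of the smoothing parameter $T_n$. Since the $k = 0$ term is always 1, the estimate averages to 1 over the circle.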



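The noise $\epsilon$ is said to be super-smooth of order $\beta > 0$ if there exist $\gamma > 0$
and $a_1, a_2 \geq 0$ such that

    $$\|\widehat{f_\epsilon}(\pi)^{-1}\|_\infty = O(\kappa_\pi^{-a_1}\exp(\gamma\kappa_\pi^\beta)) \quad \text{and} \quad \|\widehat{f_\epsilon}(\pi)\|_\infty = O(\kappa_\pi^{a_2}\exp(-\gamma\kappa_\pi^\beta))$$

as $\kappa_\pi \to \infty$. For example a standard Gaussian is super-smooth with $a_i = 0$
$(i = 1, 2)$. For $p > 0$, the Sobolev space $\mathcal{H}_p(G) := \{f \in L^2(G); \|f\|_p < \infty\}$
where $\|f\|_p^2 = \sum_{\pi \in \widehat{G}} d_\pi (1 + \kappa_\pi)^p |||\widehat{f}(\pi)|||^2$.

Theorem 5.1 (Kim, Richards) If $f_\epsilon$ is super-smooth of order $\beta$ and $\|f_X\|_{\mathcal{H}_s(G)} \leq K$
for some $s > \frac{d}{2}$ where $K \geq 1$ then the optimal rate of convergence of $f_X^{(n)}$
to $f_X$ is $(\log(n))^{-s/2\beta}$.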

A natural question to ask is "how smooth is super-smooth?" and we 
answer this as follows: 

Proposition 5.1 If $f$ is super-smooth then it is smooth.

Proof. For sufficiently large $\kappa_\pi$ and using (4.1) and (4.2) we find that
there exists $C > 0$ such that

    $$|||\widehat{f}(\pi)||| \leq \|\widehat{f}(\pi)\|_\infty \, |||I_\pi|||$$

    $$= d_\pi^{1/2} \|\widehat{f}(\pi)\|_\infty$$

    $$\leq C|\lambda_\pi|^{\frac{m}{2}}(1 + |\lambda_\pi|^2)^{a_2}\exp(-\gamma|\lambda_\pi|^{2\beta}),$$

from which it follows that $\widehat{f} \in \mathcal{S}(D)$ and the result follows by Theorem
4.2. □